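Reliable and stable 6D pose estimation of uncooperative space objects plays an essential role in on-orbit servicing and debris removal missions. Because pose estimators are sensitive to background interference, this paper proposes a counterfactual analysis framework named CaspaceNet for robust 6D pose estimation of spaceborne targets against complex backgrounds. Specifically, conventional methods are used to extract features of the whole image in the factual case. In the counterfactual case, a non-existent image that contains only the background and no target is imagined. Counterfactual analysis reduces the side effects caused by background interference, which leads to unbiased predictions in the final results. In addition, we apply low-bit-width quantization to CaspaceNet and deploy part of the framework to a Processing-in-Memory (PIM) accelerator on an FPGA. Qualitative and quantitative results demonstrate the effectiveness and efficiency of the proposed method. To the best of our knowledge, this paper is the first to apply causal inference and network quantization to 6D pose estimation of spaceborne targets. The code is available at https://github.com/shunli-wang/ca-pacenet.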
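The low-bit-width quantization mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the quantization scheme, so the symmetric uniform quantizer below is an assumption chosen for simplicity, and the names `quantize_uniform` and `dequantize` are hypothetical rather than taken from the authors' code.

```python
def quantize_uniform(values, num_bits=4):
    """Symmetric uniform quantization of floats to signed integer codes.

    Illustrative assumption only: the abstract does not state which
    quantization scheme the authors use.
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2 for a signed range")
    q_max = 2 ** (num_bits - 1) - 1  # largest code, e.g. 7 for 4 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale per tensor; fall back to 1.0 when all values are zero.
    scale = max_abs / q_max if max_abs > 0 else 1.0
    # Round to the nearest code and clamp into [-q_max, q_max].
    codes = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [1.75, -0.5, 0.0, 0.25]
codes, scale = quantize_uniform(weights, num_bits=4)
restored = dequantize(codes, scale)
```

Fewer bits shrink the code range and so increase the rounding error; per-channel scales or asymmetric ranges are common refinements that this sketch leaves out.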
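In recent years, assessing action quality from videos has attracted growing attention in the computer vision and human-computer interaction communities. Most existing approaches tackle this problem by directly migrating models from action recognition tasks, which ignores intrinsic differences within the feature map, such as foreground and background information. To address this issue, we propose a Tube Self-Attention Network (TSA-Net) for action quality assessment (AQA). Specifically, we introduce a single-object tracker into AQA and propose the Tube Self-Attention (TSA) module, which efficiently generates rich spatio-temporal contextual information through sparse feature interactions. The TSA module is embedded in existing video networks to form TSA-Net. Overall, TSA-Net has the following merits: 1) high computational efficiency, 2) high flexibility, and 3) state-of-the-art performance. Extensive experiments are conducted on popular action quality assessment datasets, including AQA-7 and MTL-AQA. In addition, a dataset named Fall Recognition in Figure Skating (FR-FS) is proposed to explore basic action assessment in figure skating scenes.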
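Imitation learning (IL) is a framework that learns to imitate expert behavior from demonstrations. Recently, IL has shown promising results on high-dimensional and control tasks. However, IL typically suffers from sample inefficiency in terms of environment interaction, which severely limits its application to simulated domains. In industrial applications, the learner usually has a high interaction cost: the more it interacts with the environment, the more damage it causes to the environment and to itself. In this paper, we improve sample efficiency by introducing a novel inverse reinforcement learning scheme. Our method, which we call Model Reward Function Based Imitation Learning (MRFIL), uses an ensemble dynamics model, trained on expert demonstrations, as the reward function. The key idea is to give the agent an incentive to match the demonstrations over a long horizon by providing a positive reward when it encounters states consistent with the expert demonstration distribution. In addition, we show a convergence guarantee for the new objective function. Experimental results show that, compared with IL methods, our algorithm reaches competitive performance while significantly reducing environment interactions.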
For sequence generation, both autoregressive and non-autoregressive models have been developed in recent years. Autoregressive models can achieve high generation quality, but their sequential decoding scheme makes decoding slow. Non-autoregressive models accelerate inference with parallel decoding, but their generation quality still needs improvement because of the difficulty of modeling multi-modalities in data. To address the multi-modality issue, we propose Diff-Glat, a non-autoregressive model featuring a modality diffusion process and residual glancing training. The modality diffusion process decomposes the modalities and reduces the number of modalities to learn for each transition. The residual glancing sampling further smooths the modality learning procedure. Experiments demonstrate that, without using knowledge distillation data, Diff-Glat can achieve superior performance in both decoding efficiency and accuracy compared with the autoregressive Transformer.
Visual odometry is crucial for many robotic tasks such as autonomous exploration and path planning. Despite much progress, existing methods are still not robust enough in environments with dynamic illumination. In this paper, we present AirVO, an illumination-robust and accurate stereo visual odometry system based on point and line features. To be robust to illumination variation, we introduce a learning-based feature extraction and matching method and design a novel VO pipeline that includes feature tracking, triangulation, key-frame selection, and graph optimization. We also employ long line features in the environment to improve the accuracy of the system. Unlike the traditional line processing pipelines in visual odometry systems, we propose an illumination-robust line tracking method in which point feature tracking and the distribution of point and line features are used to match lines. In the experiments, the proposed system is extensively evaluated in environments with dynamic illumination, and the results show that it outperforms state-of-the-art algorithms.
Semantic segmentation usually benefits from global contexts, fine localisation information, multi-scale features, etc. To advance Transformer-based segmenters in these aspects, we present a simple yet powerful semantic segmentation architecture, termed IncepFormer. IncepFormer makes two critical contributions. First, it introduces a novel pyramid-structured Transformer encoder that harvests global context and fine localisation features simultaneously. These features are concatenated and fed into a convolution layer for the final per-pixel prediction. Second, IncepFormer integrates an Inception-like architecture with depth-wise convolutions, and a light-weight feed-forward module, into each self-attention layer, efficiently obtaining rich local multi-scale object features. Extensive experiments on five benchmarks show that IncepFormer is superior to state-of-the-art methods in both accuracy and speed. For example, 1) IncepFormer-S achieves 47.7% mIoU on ADE20K, outperforming the existing best method by 1% while using only half the parameters and fewer FLOPs; 2) IncepFormer-B achieves 82.0% mIoU on the Cityscapes dataset with 39.6M parameters. Code is available at github.com/shendu0321/IncepFormer.
Despite recent progress on trajectory planning of multiple robots and path planning of a single tethered robot, planning of multiple tethered robots to reach their individual targets without entanglements remains a challenging problem. In this paper, we present a complete approach to address this problem. Firstly, we propose a multi-robot tether-aware representation of homotopy, using which we can efficiently evaluate the feasibility and safety of a potential path in terms of (1) the cable length required to reach a target following the path, and (2) the risk of entanglements with the cables of other robots. Then, the proposed representation is applied in a decentralized and online planning framework that includes a graph-based kinodynamic trajectory finder and an optimization-based trajectory refinement, to generate entanglement-free, collision-free and dynamically feasible trajectories. The efficiency of the proposed homotopy representation is compared against existing single and multiple tethered robot planning approaches. Simulations with up to 8 UAVs show the effectiveness of the approach in entanglement prevention and its real-time capabilities. Flight experiments using 3 tethered UAVs verify the practicality of the presented approach.
While feature association to a global map has significant benefits, to keep the computations from growing exponentially, most lidar-based odometry and mapping methods opt to associate features with local maps at one voxel scale. Taking advantage of the fact that surfels (surface elements) at different voxel scales can be organized in a tree-like structure, we propose an octree-based global map of multi-scale surfels that can be updated incrementally. This alleviates the need for recalculating, for example, a k-d tree of the whole map repeatedly. The system can also take input from a single or a number of sensors, reinforcing the robustness in degenerate cases. We also propose a point-to-surfel (PTS) association scheme, continuous-time optimization on PTS and IMU preintegration factors, along with loop closure and bundle adjustment, making a complete framework for Lidar-Inertial continuous-time odometry and mapping. Experiments on public and in-house datasets demonstrate the advantages of our system compared to other state-of-the-art methods. To benefit the community, we release the source code and dataset at https://github.com/brytsknguyen/slict.
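The idea that voxels at different scales nest in a tree-like structure, so that a multi-scale map can be updated incrementally, can be sketched as follows. This is a simplified illustration and not the authors' implementation: the class `MultiScaleVoxelMap`, its parameters, and the per-cell statistics (point count and coordinate sums only, with no surfel normals or covariances) are all hypothetical.

```python
import math
from collections import defaultdict


class MultiScaleVoxelMap:
    """Hash-based stand-in for an octree of multi-scale cells (hypothetical)."""

    def __init__(self, leaf_size=0.5, num_levels=3):
        self.leaf_size = leaf_size    # edge length of the finest voxels
        self.num_levels = num_levels  # level 0 is finest; each level doubles the size
        # (level, ix, iy, iz) -> [count, sum_x, sum_y, sum_z]
        self.cells = defaultdict(lambda: [0, 0.0, 0.0, 0.0])

    def insert(self, point):
        """Update the statistics of every cell containing `point`, one per level."""
        idx = tuple(math.floor(c / self.leaf_size) for c in point)
        for level in range(self.num_levels):
            cell = self.cells[(level, *idx)]
            cell[0] += 1
            for axis in range(3):
                cell[axis + 1] += point[axis]
            # Floor division by 2 gives the parent cell at the next coarser level.
            idx = tuple(i // 2 for i in idx)

    def centroid(self, level, idx):
        """Return the mean of the points in a cell, or None if the cell is empty."""
        cell = self.cells.get((level, *idx))
        if cell is None:
            return None
        count, sx, sy, sz = cell
        return (sx / count, sy / count, sz / count)


voxel_map = MultiScaleVoxelMap(leaf_size=0.5, num_levels=3)
voxel_map.insert((0.25, 0.25, 0.25))
voxel_map.insert((0.75, 0.25, 0.25))
```

Each insertion touches only `num_levels` cells, which is the property that avoids rebuilding a whole-map search structure after every scan.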
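In recent years, channel state information (CSI) has enabled WiFi-based intelligent human sensing technology. However, CSI-based sensing systems suffer from performance degradation when deployed in different environments. Existing works address this problem through domain adaptation using large amounts of unlabeled high-quality data from the new environment, which is usually unavailable in practice. In this paper, we propose a novel augmented, environment-invariant, robust WiFi recognition system named Airfi, which approaches the environment dependency problem from a new perspective. Airfi is a novel domain generalization framework that learns the critical part of CSI regardless of the environment and generalizes the model to unseen scenarios, without requiring any data to be collected for adaptation to the new environment. Airfi extracts common features from several training environment settings and minimizes the distribution differences among them. The features are further augmented to make them more robust to the environment. Moreover, the system can be further improved with few-shot learning techniques. Compared with state-of-the-art methods, Airfi can work in different environment settings without acquiring any CSI data from the new environment. Experimental results show that our system remains robust in new environments and outperforms the compared systems.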
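Industry assignment, which assigns firms to industries according to a predefined Industry Classification System (ICS), is fundamental to a large number of critical business practices, ranging from firms' operational and strategic decision making to economic analyses by government agencies. Three types of expert knowledge are essential to effective industry assignment: definition-based knowledge (i.e., expert definitions of each industry), structure-based knowledge (i.e., structural relationships among industries as specified in an ICS), and assignment-based knowledge (i.e., prior firm-industry assignments performed by domain experts). Existing industry assignment methods use only assignment-based knowledge to learn a model that classifies unassigned firms into industries, and they overlook definition-based and structure-based knowledge. Moreover, these methods consider only which industry a firm has been assigned to and ignore the time-specificity of assignment-based knowledge, i.e., when the assignment occurred. To address these limitations, we propose a novel deep learning-based method that not only seamlessly integrates the three types of knowledge for industry assignment but also accounts for the time-specificity of assignment-based knowledge. Methodologically, our method features two innovations: dynamic industry representation and hierarchical assignment. The former represents an industry as a sequence of time-specific vectors by integrating the three types of knowledge through our proposed temporal and spatial aggregation mechanisms. The latter takes industry and firm representations as inputs, computes the probability of assigning a firm to each industry, and assigns the firm to the industry with the highest probability.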
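The final step described in this abstract, computing a probability for each industry and choosing the most probable one, can be sketched as below. The abstract does not say how the probabilities are computed, so the dot-product scores and the softmax are assumptions, the function `assign_industry` is hypothetical, and the hierarchical aspect of the assignment is not modeled.

```python
import math


def assign_industry(firm_vec, industry_vecs):
    """Return (best_industry, probabilities) for one firm.

    Assumption for illustration: scores are dot products between the firm
    vector and each industry vector, turned into probabilities by a softmax.
    """
    if not industry_vecs:
        raise ValueError("industry_vecs must not be empty")
    scores = {
        name: sum(f * g for f, g in zip(firm_vec, vec))
        for name, vec in industry_vecs.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    probs = {name: e / total for name, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Made-up vectors and industry names, used only to exercise the function.
industries = {"retail": [2.0, 0.0], "mining": [0.0, 2.0]}
best, probs = assign_industry([1.0, 0.0], industries)
```

In a time-aware variant, the industry vector passed in for each industry would be the one for the period in which the firm is being assigned.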